Learning an English-Chinese Lexicon from a Parallel Corpus

Authors

  • Dekai Wu
  • Xuanyin Xia
Abstract

We report experiments on automatic learning of an English-Chinese translation lexicon, through statistical training on a large parallel corpus. The learned vocabulary size is non-trivial at 6,517 English words averaging 2.33 Chinese translations per entry, with a manually filtered precision of 95.1% and a single-most-probable precision of 91.2%. We then introduce a significance filtering method that is fully automatic, yet still yields a weighted precision of 86.0%. Learning of translations is adaptive to the domain. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant corpus size with a non-toy vocabulary.


Similar resources

Exploring Parallel Concordancing in English and Chinese

This paper investigates the value of computer technology as a medium for the delivery of parallel texts in English and Chinese for language learning. An English-Chinese parallel corpus was created for use in parallel concordancing, a technique which has been developed to respond to the desire to study language in its natural contexts of use. Specific problems of dealing with Chinese characters ...



A Language Learning System with Automatic Feedback: An Application Based on a English-Chinese Parallel Corpus

The learners of English as a second language generally need many practices on writing English, identification of the mistakes in their English, and feedback hints on how to correct their mistakes. A computer-assisted online system is designed to address these issues in a context of learning from a corpus of parallel English-Chinese corpus of New York Times news articles. In the system, students...


Contrastive connectors in English and Chinese: A case

This comparative study of however and its Chinese counterparts in two translation corpora (the HLM parallel corpus, and the Babel English-Chinese Parallel Corpus) reveals that the Chinese contrastive relations tend to be expressed implicitly (cf. Wang and Zheng 2004) and Chinese contrastive connectors are generally used in sentence initial position, whereas the English contrastive relations ten...


Bilingual Parallel Active Learning Between Chinese and English

Active learning is an effective machine learning paradigm which can significantly reduce the amount of labor for manually annotating NLP corpora while achieving competitive performance. Previous studies on active learning are focused on corpora in one single language or two languages translated from each other. This paper proposes a Bilingual Parallel Active Learning paradigm (BPAL), where an ...



Publication date: 1994